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Many standard structural quantities, such as order parameters and correlation functions, exist for 
common condensed matter systems, such as spherical and rod-like particles. However, these struc- 
tural quantities are often insufficient for characterizing the unique and highly complex structures 
often encountered in the emerging field of nano and microscale self-assembly, or other disciplines 
involving complex structures such as computational biology Computer science algorithms known as 
"shape matching" methods pose a unique solution to this problem by providing robust metrics for 
quantifying the similarity between pairs of arbitrarily complex structures. This pairwise matching 
operation, either implicitly or explicitly, lies at the heart of most standard structural character- 
ization schemes for particle systems. By substituting more robust "shape descriptors" into these 
schemes we extend their applicability to structures formed from more complex building blocks. Here, 
we describe several structural characterization schemes and shape descriptors that can be used to 
obtain various types of structural information about particle systems. We demonstrate the appli- 
cation of shape matching algorithms to a variety of example problems, for topics including local 
and global structure identification and classification, automated phase diagram mapping, and the 
construction of spatial and temporal correlation functions. The methods are applicable to a wide 
range of systems, both simulated and experimental, provided particle positions are known or can be 
accurately imaged. 



It has been long recognized that in condensed matter 
systems there exists a strong connection between ther- 
modynamics and particle packing Additionally, 
the spatial arrangement of particles in a given phase de- 
termines its mechanical, chemical, electrical, and optical 
properties. As a result, it is natural to attempt to gain in- 
sight into systems in general by characterizing and mon- 
itoring both global and local structure. In standard con- 
densed systems, this is typically achieved by constructing 
order parameters or correlation functions that are sensi- 
tive to the way particles are arranged. Several order pa- 
rameters and correlation functions have been contrived 
for standard classes of condensed matter, including, e.g., 
systems of rod-like and spherical particles [IHH]. Stan- 
dard examples include the P2 nematic order parameter 
for rod-like liquid crystalline systems and the bond order 
parameters of Nelson and coworkers for detecting crys- 
talline ordering in systems of spherical particles ^ |lOj . 
These types of standard metrics have found widespread 
use in both computational condensed matter physics as 
well as the colloidal sciences, where standard systems of 
rod-like or spherical particles are often studied. 

In the emerging field of colloidal and nanoscale self- 
assembly, unique building blocks can form assembled 
morphologies that often deviate from those expected in 
traditional condensed matter systems pTHT4l . Beyond 
the self-assembly of spherical [TS'.'TF and rod-like [17^18] 
particles [11, 13 , examples of assembled systems include 
ordered structures formed from polyhedrally shaped [19]- 
[26] or patterned particles [ST-^ , and phase-separated 
domains reminiscent of those formed by block copoly- 
mers and surfactants jTH [22l I32H36] . In these systems 
too, system stability and properties are, in many cases. 



strongly linked to their global structure and local pack- 
ing [13; 15, X8, 37-39J. However, constructing general 
order parameters for assemblies of particles of complex 
shape and interaction anisotropy [TTJ [13] is considerably 
more challenging than for traditional condensed systems, 
where the particle shapes and morphologies are compar- 
atively much simpler. As a result of the increased com- 
plexity and vast design space, there are few "model prob- 
lems" in nanoscale self-assembly for which generally ap- 
plicable order parameters can be defined. This has lead 
many recent studies of assembled systems to rely heavily 
on visual inspection or ad hoc analysis for characterizing 
structures, which are often more time consuming and less 
accurate than mathematical analysis. 

In this article, we address the problem of creat- 
ing general structural metrics for complex colloidal and 
nanoscale assemblies, and other systems with a high de- 
gree of structural complexity. To do so, we combine the 
physical insights underlying many standard condensed 
matter order parameters with the mathematical insights 
provided by the computer science field of "shape match- 
ing." We show that virtually all of the standard struc- 
tural characterization schemes from the more general 
condensed matter literature can be broken down, funda- 
mentally, into the problem of quantifying the degree to 
which structures match. Structural similarity, in turn, 
can be quantified by using robust "shape descriptors" 
from the field of shape matching, which can be applied 
to arbitrarily complex structures. We decompose several 
order parameters, correlation functions, and other stan- 
dard structural characterization schemes into their core 
elements, such that they can be used with arbitrary shape 
descriptors to extend their applicability. Additionally, 
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we introduce new, more abstract structural characteriza- 
tion schemes that can also be used with arbitrary shape 
descriptors, to solve novel problems that arise in com- 
putational studies of self-assembly. The shape matching 
methods that we provide will facilitate the creation of 
new structural metrics that are standardized, improv- 
ing accuracy and comparability, but are also still flexible 
enough to be applied to the new classes of complex struc- 
tures that arise in assembly problems. 

This article is organized as follows. In section [Tj we 
provide an overview of the shape matching framework 
and terminology that we will employ and describe how 
it connects to some standard structural characterization 
schemes from the condensed matter literature. In sec- 
tion [nj we review some relevant shape descriptors from 
the shape matching literature that can be applied to 
assembled systems. In section |III[ we introduce some 
simple "similarity metrics" that can be used together 
with the shape descriptors from secti on [ll| to measure 
structural similarity. Finally, in section |IV[ we introduce 
general algorithms based on shape descriptors and sim- 
ilarity metrics that can be used to obtain various types 
of structural information for complex particle systems. 
To demonstrate the usage of these algorithms, we apply 
shape matching to systems in the fields of nanoscience, 
computational self-assembly and condensed matter. Our 
examples include identifying local and global structures, 
quantifying structural changes as a function of time or a 
control variable, constructing correlation functions, map- 
ping structural phase diagrams, and grouping similar 
structures. We cover a wide range of systems includ- 
ing ordered phases formed from spherical and point-like 
particles, a fluid of tetrahedrally-shaped particles with lo- 
cally ordered motifs self-assembled systems of teth- 
ered nanoparticles with various nanoparticle shapes [40]- 
[42] . patchy colloidal tetrominoes [43^, a helical ribbon 
formed from tethered nanorods [44] , a model protein [45] , 
gold nanowires ^46, ,47^, and small clusters of water 
molecules [48]. The examples that we provide are ap- 
plicable to particle systems in general, provided that 
the particle positions and, in some cases, orientations, 
can be detected. Although not explicitly treated here, 
other data representations such as images or diffraction 
data can also be used to obtain structural metrics within 
the shape matching framework. To aid in the develop- 
ment and dissemination of shape matching techniques, 
we provide accompanying software and examples via the 
web [49]. 



I. SHAPE MATCHING OVERVIEW 

The problem of quantifying how well structures match 
(see Fig. [T]) has been generalized within the context of the 
computer science field of "shape matching" [50]. Famil- 
iar shape matching applications include matching finger- 
prints, signatures [50 , faces [51 , and iris patterns [52] . 
Shape matching schemes have already been applied to 



systems of particles, particularly in the realm of fast 
database searches for proteins and macromolecules [53- 
59 . In some specific cases, shape matching schemes 
have been explicitly applied to local structure identifi- 
cation in problems in condensed matter and nanoscale 
self-assembly [6QH62] . Recently, we generalized the con- 
cept of applying shape matching methods to assembled 
systems [63 , and demonstrated how a particular class 
of harmonic shape descriptors can be applied to a wide 
variety of self-assembly problems [64] . In the present ar- 
ticle, we provide a thorough survey of both the different 
types of shape descriptors and structural characterization 
schemes that can be applied to assembled systems. 

The basic idea of shape matching is to "index" struc- 
tures into mathematical fingerprints known as "shape de- 
scriptors," S, and then compare them using a similarity 
metric M(Si, Sj) to obtain both a quantitative and qual- 
itative measure of similarity between the structures. For 
mathematical simplicity, we constrain our shape descrip- 
tors here to be vectors containing an arbitrary number 
of components, and our similarity metrics to be scalars 
that indicate the degree of correspondence between pairs 
of shape descriptors vectors. Matching can then be per- 
formed using straightforward vector operations, based 
on, e.g., the degree of alignment of or distance between 
shape vectors. Matching information is used to create 
order parameters and correlation functions, or to iden- 
tify structures by comparing "query" structures to "ref- 
erence" structures. Since we can choose virtually any 
structure as a reference, this scheme facilitates the cre- 
ation of highly specific structural metrics. The workflow 
for an application within the shape matching framework 
is shown in Fig. [l] 

To apply these ideas to particle systems, we begin by 
asserting that most standard structural metrics include 
an implicit concept of "matching." That is, an order 
parameter or correlation function typically tells us the 
degree to which a structure of interest matches another 
(often ideal) structure. Most standard structural charac- 
terization schemes implicitly fit within the shape match- 
ing framework, and can be decomposed into query struc- 
tures, reference structures, shape descriptors, and simi- 
larity metrics. 

For example, consider the well-known order parame- 
ter p2 which detects nematic (aligned) liquid crystalline 
ordering: 

A = (P.(cos^)) = (^^^^). (1) 

The function P2 is the second Legendre polynomial [67] 
and 6 is the angle between the axis of the molecule and 
the "local director" d that indicates the preferred direc- 
tion of the overall sample [4 . As the direction deviates 
from the preferred direction, P2 decreases proportionally. 
The global nematic order parameter P2 is obtained by 
computing the ensemble average of P2 , denoted by angle 
brackets. In this scheme, the query structure is the sys- 
tem of particles and the reference structure is an ideal 
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FIG. 1: Data flow diagram for sfiape matcfiing. (a) A rep- 
resentative pattern is extracted for a given particle struc- 
ture and then indexed into a structural fingerprint known 
as a shape descriptor, S. The depicted cluster is an energy- 
minimized quantum Lennard- Jones cluster £5^.66^. (b) Shape 
descriptors are then compared to obtain similarity informa- 
tion M, which can be applied within the context of various 
structural characterization schemes. 



nematic liquid crystal with director d. The shape de- 
scriptor is given by the collection of angles between the 
molecular axes and d and the similarity metric is given 
by the Legendre polynomial P2 . The order parameter P2 
gives an optimal value of 1 when the structure matches 
a perfectly aligned liquid crystal with director d, and 
tends toward zero the more the structure deviates from 
this ideal case. 

As another classical example, consider the hexatic cor- 
relation function for 2d systems of spherical particles, or 
disks HIMI: 



96{r) 



(2) 



1 

;(i) = - Vexp(z6l9^). 



(3) 



The quantity ?/^6 is a "bond order" parameter, defined 



Here, n is the number of atoms in the first neighbor shell 
of an atom i, and d is the direction of neighboring atom 
j. The value of '^6(^)'06(i) approaches when the par- 
ticles i and j are both in hexagonal local environments 
with the same spatial orientation, and varies towards 
otherwise. Thus, like the standard radial distribution 
function g(r\ g^(r) measures the degree of spatial order- 
ing; however, whereas gir) is sensitive to translational or- 
dering generally, g^ir^ is specifically sensitive to aligned 
hexagonal ordering. In this scheme, the query and refer- 
ence structures are the pairs of neighbor shell clusters for 
atoms i and j, which are a distance r apart, the shape 
descriptors are the local values of ^/^e, and the similarity 
metric is the 'dot' operator, which measures the coher- 
ence between two descriptors. Thus, g^(r) measures 
how closely the local environment of a particle at the ori- 
gin matches with that of a particle a distance r away in 
terms of both hexagonal shape and spatial orientation. 

Although P2 and g^{r) are specific schemes, the physi- 
cal insights underlying them are general. Recasting stan- 
dard schemes within the shape matching framework al- 
lows us to obtain the same types of information, but with 
different shape descriptors S and similarity metrics M 
that are better suited for the unique and complex struc- 
tures observed in assembled systems. The latter provides 
the main substance of this article, but first, we introduce 
several shape descriptors and similarity metrics in the 
following sections. 



II. SHAPE DESCRIPTORS 

The shape descriptors that we describe in this sec- 
tion are adapted from the computer science field of 
shape matching. We constrain our discussion here to 
the subset of shape matching methods that we believe 
are most readily applicable to particle systems in both 
two and three dimensions. For a more comprehensive 
review of shape matching methods, see, for example, ref- 
erences [69ti7T] . 

The first step towards creating an order parameter 
within the shape matching framework is to index the 
shapes representing the structure of interest into one or 
more shape descriptors S. For simplicity, we consider in 
our framework shape descriptors to store structural in- 
formation in a vector, which may contain real or complex 
components. However, shape descriptors may take other 
forms. 

In addition to containing structural information, shape 
descriptors may possess other desirable properties and 
contain additional data, which may determine which de- 
scriptor is optimal for a particular application. One im- 
portant property of shape descriptors is "invariance," 
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defined as the ability for the descriptor to remain un- 
changed under certain mathematical transformations, 
such as scaling, translations, or rotations. In the context 
of particle systems, rotation invariance is a highly desir- 
able property, since many applications involve comparing 
structures in a way that is independent of their spatial 
orientation. For descriptors without rotation-invariance, 
alignment or "registration [72, 73 " algorithms must be 
employed prior to matching to remove orientational de- 
pendence. Since particle systems often exhibit thermal 
noise, another desirable property of shape descriptors 
is robustness under small perturbations. However, this 
property must be balanced with the property of sensi- 
tivity, so that descriptors are still capable of detecting 
subtle structural differences. Another important consid- 
eration is the amount of computational time required to 
compute and compare the descriptors, which may vary 
drastically for different schemes. Often, there is a direct 
tradeoff between computational cost and accuracy and 
attention to detail. 

In the following sections, we provide a brief overview 
of some shape descriptors with different combinations of 
these properties that are well suited for self-assembled 
systems of particles. These descriptors are not represen- 
tative of the full realm of possibilities, but rather are 
meant to serve as demonstrative examples. It is impor- 
tant to note that in principle, there is no limit on how 
the shape descriptor is calculated. Here, we constrain 
our analysis to descriptors that can be described mathe- 
matically as a vector, since this simplifies the process of 



writing general similarity metrics in section III However, 
in general, not all descriptors can be represented in this 
way, and thus require different similarity metrics. 



A. Data Representations 



Particle systems are typically represented as either a 
set of points (point cloud data) or solid objects (volu- 
metric data). Both types of data can be represented by a 
set of position vectors {X} = {xi,X2, ...Xn} and weights 
{/} = {/i^ /2, •••/n}. Point cloud data {X} typically rep- 
resents particle positions, in which case the weights fi are 
all 1. For volumetric data, the position vectors x^ rep- 
resent the location of voxels (n-dimensional pixels) with 
intensities given by fi. There is no formal rule regarding 
how to best represent input data for a given system. In 
general, point cloud data is optimal when particle shapes 
are not important, such as is the case with point particles. 
Volumetric data is optimal when particle size, shape, or 
orientation are important, such as with systems of rods 
or polyhedra, or when the system is coarse-grained in 
space, such as with phase- separated structures. Image 
processing algorithms [JO] |811 [82] can often be employed 
to change between the two data representations. 



B. Point-Matching or RMS Descriptor 

For relatively simple structures, such as small clus- 
ters or macromolecules, we can use the particle positions 
themselves as a shape descriptor (Fig. Matching 
for this simple scheme is often based on the root-mean- 
square (RMS) difference between points, and thus the 
scheme itself is often referred to as "RMS matching." 
Mathematically, the point matching descriptor S^^^ is 
defined trivially by the pointset {X}: 



:jRMS 



{xi,X2, ...Xn}. 



(4) 



Here, each x^ is a (i-dimensional vector representing the 
position of the ith point in {X}. The S^^^ shape de- 
scriptor [53l [72j [74] is a vector with n x d components. 
Typically the centroid is subtracted off and the vectors in 
{X} are normalized, e.g. by dividing by the average dis- 
tance between points. Point matching schemes were ap- 
plied in early attempts at shape-based database searches 
for macromolecules [53 , and increasingly powerful varia- 
tions of these schemes have since been implemented [5^ . 
Although point matching schemes have the advantage of 
being conceptually simple, there are many subtle draw- 
backs associated with them. First, point matching re- 
quires an assignment step to determine the optimal cor- 
respondence between points in compared structures. The 
coordinates in the shape descriptors are then re-ordered 
accordingly. As a coarse approximation, points can be as- 
signed based on the minimum distance or the maximum 
alignment between individual coordinates. For example, 
a point i on the query structure, can be assigned to the 
point j on the reference structure that maximizes the 
fitness Wij^ defined as, e.g., Wij = fifj\^i • Xj|. This 
scheme has the disadvantage that it is possible to assign 
multiple points on the query structure to a single point 
on the reference structure. A more robust method in- 
volves creating a "fitness matrix" that records the degree 
of correspondence between all pairs of points: 



F = 



( ^1,1 

\ 



^1,2 



,1 w„ 



(5) 



The variables and represent the number of points 
in the query and reference structures, respectively. We 
can then use a numerical technique, such as the Hun- 
garian method [83 , to efficiently determine the optimal 
assignment matrix that maximizes the overall fitness of 
the match. An additional subtlety arises when ^ n^. 
In this case, outliers can be excluded to obtain a "partial 
match" between structures. This is accomplished by se- 
quentially removing points with the lowest total fitness 
defined as Wi = Yl^=i^iJ- The number of points 
excluded depends on the desired application. For par- 



tial matching, we might exclude \nq 



rir 



points from 



whichever structure contains the fewest points. For ex- 
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FIG. 2: Depiction of five different shape descriptors, (a) The RMS descriptor [721174] . Descriptor components are given trivially 
by particle positions or density map. (b) The shape histogram descriptor [75]. The structure is indexed into a histogram 
consisting of rir shells and ne sectors, (c) The D2 shape distribution descriptor [76]. The probability distribution is computed 
for various local measurements, such as the distance or angle between surface points, (d) The Fourier descriptor [771 178] . A 
pattern along the perimeter of the circle or on the surface of a sphere is decomposed into a harmonic representation, (e) The 
Zernike descriptor [791 180] . A pattern on the unit disk or unit ball is decomposed into a harmonic representation. 



eluding outliers, we might exclude all points with Wi be- 
low a certain threshold. 

In addition to requiring assignment, the RMS descrip- 
tor also has the drawbacks that it is sensitive to scale, po- 



sition, and orientation, and structures must first be nor- 
malized and registered unless the orientations are known 
beforehand or the intended application utilizes rotation- 
dependent matching. Depending on the application, ob- 
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jects may be registered based on rigid alignment, or other 
constraints [74 . Rigid registration can be achieved us- 
ing either the iterative closest point (ICP) method [72] , 
which involves minimizing the distance between points on 
compared objects by iterative rotations and translations, 
or the principle components analysis (PC A) method [73] . 
which aligns objects with common principle axes. The 
ICP method has the disadvantage that it is non-trivial 
to implement, computationally expensive for structures 
with many points, and must be performed for all pairs of 
compared shapes. Moreover, it is prone to error if applied 
naively; the ICP method converges to a local minimum, 
so many initial orientations need be attempted to ensure 
convergence to a global minimum. The PCA method is 
only applicable to objects with distinct principle axes and 
thus fails for spherical objects. Despite the simplicity of 
the point-matching shape descriptor, implementation of 
the RMS method can often be non-trivial. Since both as- 
signment and registration are computationally expensive 
(i.e. they scale poorly with n) point matching descriptors 
should be avoided unless (1) n is small, (2) matching is 
required for only a few structures, or (3) registration is 
not required. 



C. Shape Histogram Descriptor 

Another shape descriptor that is conceptually simple 
and has been applied to molecular database searches is 
known as the "shape histogram" [75 (Fig. [^Jd). This de- 
scriptor is based on a density map of the structure on 
a polar or spherical grid. The shape histogram is con- 
structed in 2d by first generating uq equiangular gridlines 
on the unit circle: 



6i = 27ri/no 



0, ... - 1. 



(6) 



The value of uq is chosen so as to capture important 
structural features while balancing computational effi- 
ciency. Structures with radial dependence can be divided 
into Ur concentric shells. A given component in the 2d 
shape histogram descriptor is then given by: 



27r 



(7) 

The S^^ descriptor then contains nQjir real components, 
one for each bin in the histogram: 



:iH2 



qH2 qH2 qH2 \ 

^1 7^2 •'•••^nerir/ 



(8) 



The 3d version of the shape histogram is constructed 
in a similar way, except that in this case, there are many 
different ways to construct the grid. An equiangular grid 
with and uq azimuthal bins and ^n^ polar bins is given 
by: 



9i = iri/no^ 
i = 0, 1, . . .n^ - 1, 



j = 0, l,...,n0 



(9) 



1. 



1^2 



The total number of cells defined by the gridlines is 
The 3d equiangular grid introduces artifacts near the 
poles of the sphere where the cells are small compared to 
the equator. Such artifacts are inherent to 3d grids on the 
sphere; there is no way to create an evenly-spaced grid on 
the sphere with equivalent cells. However, there are sev- 
eral alternatives to the equiangular grid, such as the rec- 
tilinear grid, icosahedral grid, etc., that give more evenly- 
sized cells [84J. A given component in the 3d shape his- 
togram descriptor for an equiangular grid is given by: 
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ne9{yLi) 




n0^(xi) 


'^max 




7T 




27r 



(10) 



As for S , shapes with r-dependence are indexed by 
computing separate angular histograms for each radial 
shell. The S^^ descriptor contains ^n^n^ real compo- 
nents, one for each bin in the histogram: 

sH^ = (5r,5P,...5H^„./,). (11) 

The shape histogram has several advantages over the 
point matching method. First, no assignment step is re- 
quired, since the histogram does not retain information 
about the ordering of the input points. Additionally, the 
grid resolution can be adjusted by modifying uq and/or 
ricfy to provide a desired degree of spatial coarse-graining. 



The shape histogram has the disadvantage that, like the 
point matching method, it requires registration to match 
shapes that are not aligned spatially, unless only radial 
bins are used (i.e., ng = = 1). Shape histograms 
with only radial bins are typically only applicable for 
obtaining coarse measures of similarity, since shape his- 
tograms lose much of their discerning capabilities without 
an angular component. If n is large, the cost of regis- 
tration can be significantly reduced by aligning the his- 
tograms themselves rather than the raw data. Shape his- 
tograms are best suited for describing structures that can 
be broken down into concentric circles or spheres. Exam- 
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pies include nanoparticle clusters, proteins and macro- 
molecules. Shape histograms are also well suited for in- 
dexing global structures with orientational ordering such 
as crystals or quasicrystals, wherein the bond or neigh- 
bor directions of particles create a global pattern on the 



circle or sphere, as described in section III 



D. Shape Distributions 

For many applications, registration is costly and 
rot at ion- invariant descriptors are optimal. A simple yet 
powerful method for creating rotation-invariant descrip- 
tors is given by the "shape distributions" scheme [76] 
(Fig. This scheme involves creating distribution 

functions for simple invariant local metrics. The shape 
distribution "D2" is defined as the probability distribu- 
tion of observing two surface points i and j a distance r 
apart. A given component in the D2 descriptor is given 
by: 



qB2 

Oh. 



Tlr (|Xj 



k 



(12) 



The D2 descriptor is the collection of radial compo- 
nents: 

S^' = {S?\S^\...S^^). (13) 

Notice that this function is similar to the standard radial 
distribution function g{r), except that there is no ideal 
gas normalization and the function is typically computed 
only for points on the surface of the object. 

A similar distribution "A3" is defined by the probabil- 
ity of observing an angle 6 between three surface points: 



ngXj ■ Xfc 



■x,|) 



I 



(14) 



The A3 descriptor is the collection of no components: 



2A3 



_ I qA3 qA3 cA3\ 
— 5 ^2 5 • • • ^ne I ' 



(15) 



Notice that this function is similar to the angular distri- 
bution function a{S). 

Similar distributions can be contrived for sets of four, 
five, etc., points; however, D2 and A3 were shown to 
have the best discerning capabilities for the structures 
tested in reference [76]. Shape distributions are best 
applied to structures with clearly defined, distinguish- 
able surfaces, such as phase-separated structures formed 
by block copolymers [33l |85] or tethered nanoparti- 
cles fT2l 1221^42, 86 . Like gir) and a(l9), shape distribu- 
tions are too coarse to distinguish between similar shapes, 
such as small polyhedral clusters. 



E. Fourier Descriptors 

For shapes with more subtle differences, such as lo- 
calized nanoparticle clusters, macromolecules, or global 



crystal structures, we can apply a more complex but 
more powerful technique for creating rotation invariants 
based on computing the harmonic transform of the shape 
histogram. By disregarding phase information from the 
harmonic transform, we obtain descriptors that are in- 
variant under rotations. The formulae for the harmonic 
transform depend on the underlying basis. Invariants 
can be obtained for shapes on the unit circle [77 (^- 
dependence), sphere [781 [87] ^-dependence), disk |79] 
(r, 6>-dependence) or ball [80] (r, 6>, ^-dependence). On 
the unit circle or sphere, the harmonic descriptors are 
known as Fourier descriptors (Fig.|2]i). On the unit disk 
or ball, the descriptors are known as Zernike descrip- 
tors (Fig. |2^), which we discuss in the following section. 
Additional details regarding the properties and imple- 
mentation of these descriptors for molecular systems are 
provided in a separate reference [64 . 

The Fourier descriptors are based on the Fourier trans- 
form, which involves decomposing a function into a sum 
of harmonic components. The Fourier coefficients for 
a 2d pattern are obtained by computing the discrete 
Fourier transform for each "she ll" g of the 2d shape his- 
togram, S^^, defined in section II C 



cH2 

1 "^sne+j 



(16) 



Here, uq is the number of sectors in each shell s in the 
shape histogram. By considering each shell indepen- 
dently, we reduce a 2d problem (a function of r and 
to Tij. Id problems (functions of d only). The coefficients 
il)^ are complex numbers. 

Although the Fourier coefficients in their complex 
number form are not rotation-invariant (which may be 
beneficial for some applications), they can be converted 
to an invariant form by computing the magnitude of each 
coefficient. The invariant coefficients for a pattern on the 
circle are given by: 



1^,1 = ^,^^; = [5R(^«)' + 9(^^) 



,1/2 



(17) 



The Fourier invariants are positive real numbers. To cre- 
ate a Fourier descriptor for a given shell 5, we take a 
collection of desirable coefficients: 



(18) 



In this equation, we have used invariant coefficients; how- 
ever, rotation-dependent coefficients are useful for many 
applications [6^ JO^ £2^ ,88^ . The coefficients are sensitive 
to patterns with angular frequencies that match the pa- 
rameter i. For example, is large for 4-fold patterns, 
?/^6 is large for 6-fold patterns, etc. Specific coefficients 
can be chosen to describe structures with particular an- 
gular frequencies. In general, an arbitrary pattern can 
always be described by a sufficiently large range of i. 
For the problems that we consider, we typically take t 
in the range imin ^ 2, imax ^ 10- The overall Fourier 
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descriptor is given by concatenating the descriptors for 
each sheh into a vector: 

An analogous scheme can be used for 3d objects, where 



shehs in the shape histogram have [6>, (j)] dependence. The 
Fourier coefficients are obtained by computing the dis- 
crete spherical harmonics transform for each "shell" s of 
the 3d shape histogram, S^^: 



Ene-l v^n^-l gH3 Arm pri 
7=0 2^k=0 ^snen^i^i 



^ 27rfe 



2^j=l Z^fe=l ^snen^ 
+k 



(20) 



Here, N^^ is a normalization factor 
V'(2^ + l)(^-m)!/(^ + m)!, and PJ^ is a Legen- 
dre polynomial [67. The variable m is an integer 
m G [—^^^]. Therefore, unlike the circular coefficients 
ipi^ which are complex numbers, the spherical coeffi- 
cients are vectors with 2^ 1 complex components. 
Rotation-invariant versions of the coefficients can be 
obtained by computing the vector magnitude: 




- \qT?] . (21) 



Like the Fourier invariants on the circle the Fourier 
invariants on the sphere Iq^l are positive real numbers. 
To create a Fourier descriptor for a given shell 5, we take 
a collection of desirable coefficients: 

= (|q£^,^,s|, + • • \^^rr^a^,s\) ' (22) 

Again, we have chosen invariant coefficients, but rota- 
tion dependent coefficients may also be used. The overall 
Fourier descriptor on the unit sphere is given by: 

s^3 = (sr,sr,...s:^y (23) 

Again, different combinations of coefficients can be used 
to create shape descriptors with different levels of robust- 
ness and sensitivity to particular symmetries. 

By using harmonic descriptors we gain many of the 
same advantages of the shape histogram, but without 
the need to register the objects or histograms. Like the 
shape histogram, harmonic descriptors are well suited for 
describing a wide variety of shapes including nanoparti- 
cle clusters, proteins and macromolecules, crystals com- 
posed of arbitrarily shaped particles and, in some cases 
phase separated structures. Harmonic descriptors exhibit 
an inherent data smoothing mechanism; thus they are 
typically better-suited for describing small polygonal or 
polyhedral clusters than the shape histogram, which is 



prone to error without sufficient averaging. These prop- 
erties, along with the unique ability to yield symmetry- 
specific information, have already been successfully ap- 
plied to constructing orientational order parameters for 
small clusters of point particles and simple crystals in the 
context of bond order parameters [5 , 9, 10 , 68 . While 
the bond order parameters scheme focuses primarily on 
the numerical values of specific coefficients (often and 
ge), the shape matching approach more closely resembles 
a signal processing application where we utilize a broad 
spectrum of Fourier coefficients. Additionally, while the 
bond order parameters were defined for point clusters 
that form patterns on the circle or sphere, the descrip- 
tors introduced here can be applied to volumetric objects 
and objects with r-dependence. Notice that in the limit 
of infinitesimal angular bin size and a single radial bin 
{nQ^rifj) ^ oo, = 1), our definitions of ?/^^ and be- 
come nearly equivalent to the bond order parameters, 
only differing by a sign in the complex exponential. This 
only makes the direct mathematical connection between 
harmonic descriptors and the Fourier transform more ex- 
plicit; the change is otherwise inconsequential. We ex- 
plore the properties of Fourier descriptors in more detail 
in reference [64J. 



F. Zernike Descriptors 

The Fourier descriptors introduced in the previous sec- 
tion have pseudo r-dependence. That is, radial informa- 
tion is incorporated by decomposing the structure into 
concentric shells and then computing independent de- 
scriptors for each shell. This is problematic for structures 
with a small number of sample points, such as small clus- 
ters, because random perturbations can move points be- 
tween nearby shells. A second, more subtle drawback oc- 
curs when attempting to distinguish between structures 
for which the shapes of the shells are similar, but the 
relative orientation of the shells within the structure are 
different [78j . Since the descriptors are computed for each 
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shell independently, rotation-invariant descriptors are in- 
sensitive to relative orientations within the structure [78 . 
As a result, in many cases, it is preferable to compute 
harmonic descriptors with full r dependence, known as 
"Zernike descriptors [73 [80]" (Fig. [2^). The coefficients 
of the Zernike expansion, known as "Zernike moments," 
are computed by adding a Zernike radial polynomial to 
the Fourier coefficients: 



The moments are subject to the constraint that i < n 
and {n — i) is even. Each moment is a complex number. 
The rotation invariant Zernike moments on the unit disk 
are given by: 



\(ln£\ — CLn£CL^^. 



(26) 



Kir) 



(n-m)/2 

= E 



{-!)'' {n-ky. 



k=0 



k\ ((n + m)/2 - fc)! ((n - m)/2 - fc)! 



(24) 

Here, r is the radial distance from the origin r G [0, 1], 
and m and n are integers, n > m > 0. The Zernike 
moments on the 2d unit disk are given by: 



n-2k The 2d Zernike invariants are positive real numbers. A 
' Zernike descriptor can be created by concatenating the 
desired Zernike moments into a vector, for example: 



(n + l)E"lo Efc 



1 sr^ne-l qH2 



=0 



(^n£ 



= 1 "^jne+k 



(25) 



XZ2 



J). (27) 



Again, we have chosen invariant coefficients, but rota- 
tion dependent moments may also be used. The Zernike 
moments on the 3d ball are given by: 



3(n + 1) E;Io ' m-o' Er=V' SfZ^^R^e {t) NpP^ 



+1 



-im 



27tI 



Z^j = l Z^fc=l Z-^l=l "^jn^n^ 
+1 



(28) 



Again, we take £ < n and {n — £) is even. Whereas the 
2d Zernike moments ani are complex numbers, the 3d 
Zernike moments z^^ are complex vectors with 2^ + 1 
components. The invariant Zernike moments on the unit 
ball are given by: 



Ke\ = . ^ E I^SP- (29) 

\ m=—i 

The 3d Zernike invariants are positive real numbers. A 3d 
Zernike descriptor is created by concatenating the desired 
moments into a vector, for example: 

= {|zn|, |z2o|, |Z22|, . . . |z«_,«_J> . (30) 

Again, we have chosen invariant moments, but rotation- 
dependent moments and other combinations of frequen- 
cies may also be used depending on the problem. 

Zernike descriptors are best applied to shapes that can- 
not be described by angles alone, such as certain clusters 
of nanoparticles, macromolecules, or complex crystals. 
When computing Zernike moments it is essential that 
the patterns being compared are normalized consistently 
on the unit ball or disk. Typically, normalization is per- 
formed by translating the centroid of the structure to the 
origin and rescaling the coordinates such that every point 



on the pattern has a radial distance less than 1. This 
scheme is sufficient for the majority of patterns that we 
encounter in assembled systems. 

G. Combined Descriptors 

In many cases, we can create new descriptors by tak- 
ing linear combinations of the descriptors outlined above. 
Since the descriptors are represented as vectors, they can 
be concatenated together to combine their properties. 
Descriptors may also be multiplied by a weighting vec- 
tor to (de)emphasize certain components. Other simple 
descriptor operations, such as averaging or taking prob- 
ability distributions can also be useful, particularly for 
describing global structures, as outlined in section |II I| 
below. 

More complex combinations of descriptors can be cre- 
ated for specific applications. For example, one pow- 
erful solution to the problem of "partial matching" is 
given by the "shape contexts" method [89 , which com- 
bines elements of the point matching descriptor with the 
shape histogram descriptor. A separate shape histogram 
is computed for each point in the structure, where the 
coordinate system is centered at that point. The points 
in the query structure are then assigned to their cor- 
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responding reference structure points by optimizing the 
match between shape histograms. Outher points that 
do not correspond weh can be excluded to obtain a par- 
tial match. Another powerful method based on combin- 
ing descriptors is given by the light-field descriptor [90'. 
This descriptor combines information obtained from tak- 
ing several 2D images of a 3D structure at several dif- 
ferent vantage points (typically the 20 vertexes of a do- 
decahedron). Each 2D image is then indexed by an ap- 
propriate shape descriptor, and assignment is performed 
between the collection of images obtained from different 
structures to optimize correspondence. In practice, many 
initial rotations of the reference frame are attempted to 
find a rotation that optimizes correspondence. Although 
the shape contexts and lightfield descriptors are special- 
ized, the method of combining descriptors to optimize 
properties is applicable to a wide range of problems. 



H. Other Possible Descriptors 

The shape descriptors that we have introduced above 
are not meant to represent a complete set, but rather 
representative examples. Any quantitative measure of 
structure can be used as a shape descriptor provided that 
it can be indexed into an n-dimensional vector (or ma- 
trix). Along this line, there are several metrics defined 
in the literature that could fit into the shape matching 
framework and could be used to inspire useful new shape 
descriptors. For example, diffraction patterns, radial dis- 
tribution functions, or orientation tensors (e.g. radius of 
gyration tensor or nematic order tensor [91j) could be 
indexed into shape descriptors that describe global struc- 
tures. Schemes such as the common neighbor analysis 
scheme proposed in reference [7 , could also be readily 
incorporated into this framework to describe local struc- 
tures. Additionally, other structural metrics from the lit- 
erature that individually may not be independently dis- 
tinguishing for a wide range of problems, could still yield 
useful information through linear combination. 



I. Extracting Global Patterns for Shape 
Descriptors 

With the exception of shape distributions, the descrip- 
tors defined in the preceding sections are designed to in- 
dex local structures such as small clusters of atoms or 
nanoparticles, macromolecules, or large but finite micro 
or nanoscale assemblies. Describing global structures is 
more difficult, since local shapes must first be extracted 
from the infinite system and then combined into patterns 
that refiect the "global shape" for indexing. The man- 
ner in which we construct global patterns depends on the 
structural properties of the system. In a rough sense, we 
can group global structures into two different categories: 
structures with long range orientational ordering (00), 
and those without. 



For structures with long-range 00, such as crystals 
and quasicrystals [92], the probability density of neigh- 
boring particles is highly correlated for all particles in 
the system. Thus, an intuitive global shape is given by 
the superposition of all neighbor directions for each local 
structure in the system [10], sometimes called a "bond 
order diagram |93j." This is depicted for the diamond 
structure [2^ in Fig. [3^, top. As detailed in the pre- 
vious sections, this type of pattern is best indexed by 
the shape histogram, or, for rotation-invariant matching, 
the Fourier descriptors or Zernike descriptors. In the 
case that it is important to distinguish between particle 
types, independent global descriptors should be created 
for each type independently, and combined later via con- 
catenation. An example is given for the tetragonal cylin- 
der structure formed from tethered nanospheres [40 in 
Fig. [3^, middle. Global descriptors based on orienta- 
tional ordering are applicable to crystalline structures in 
general, including phase-separated systems arranged in 
crystalline superstructures [40l [Hi. In this case, the 
neighbor directions are computed for the centers of the 
micelles, cylinders, etc. rather than the individual parti- 
cles. 

Non-crystalline globally-ordered phase-separated 
structures such as layered or network structures can 
be approached in a similar way. However, rather than 
creating a descriptor based on the superposition of local 
neighbor directions, a global descriptor is built up based 
on the superposition of local density maps. An example 
is given by the lamellar structure formed by tethered 
nanospheres [86] in Fig. [3|i, bottom. The resulting 
patterns can be indexed by shape histograms, Fourier 
descriptors, Zernike descriptors, etc. in the same way 
as for crystalline long range order. To capture ordering 
on a range of lengthscales, descriptors should be created 
with a radial component that spans the lengthscales of 
interest. 

For systems with no long range ordering such as liq- 
uids, gases and amorphous solids, a different approach 
must be used. Since combining neighbor directions or 
density maps by superposition in non-distinguishing, we 
instead compute the probability distribution of these lo- 
cal patterns. This method is depicted for a dense liq- 
uid [94 in Fig. [sJd. Since this requires a separate de- 
scriptor for every local structure, registration becomes 
computationally prohibitive. Thus, rotation-invariant 
descriptors, such as Fourier descriptors or Zernike de- 
scriptors, are typically optimal. Computing probabil- 
ity distributions is also useful for complex structures re- 
gardless of long range ordering. For example, for the 
double gyroid structure shown in Fig. [sJd, bottom [61], 
the superposition of local density maps may become 
non-distinguishing for the global sample since there are 
many different directions, and probability distributions 
may present a better alternative. As mentioned in sec- 
II D[ complex phase-separated structures can often 



tion 



be distinguished by shape distribution descriptors. How- 
ever, while these descriptors are simple, they yield only 
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FIG. 3: Depiction of strategies for extracting global patterns, (a) Global patterns by superposition. For structures with long 
range orientational ordering, a global pattern can be extracted by translating all local clusters or density maps to a common 
origin. (6) For structures with no long range orientational ordering or complex structures with many important directions, a 
global pattern can be built up from the probability distribution of local patterns. 



a coarse measure of the shape, and thus can be non- 
distinguishing for similar structures. 



and Si • Sj, are defined. The similarity metric based on 
the Euclidean distance is given by: 



III. SIMILARITY METRICS 

The degree to which two shape descriptors match [95] 
is quantified by a scalar similarity metric M(Si,Sj). 
Since shape descriptors are vectors by construction, stan- 
dard vector operations such as the Euclidean distance 
or vector projection provide natural similarity metrics. 
Along this line, two standard similarity metrics, — Sj 



1/2 



(31) 



Here, k is one component of the shape vector S, which 
may be a real or complex number. Similarly, a similarity 
metric based on the projection of one shape descriptor 
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vector onto another is defined by 



-,1/2 



(32) 



For the sake of comparison, it is useful to define the 
similarity metrics on the interval M G [0,1], with 1(0) 
giving the maximum (minimum) match. Thus, we rede- 
fine the Euclidean distance similarity metric as: 

Md^st{S,, Sj) = 1 - [(S, - S,) / (|S,| + |S, I)] . (33) 

Similarity, we redefine the projection-based similarity 
metric as: 

Md^tiSi, S,-) = i [1 + (Si • S,-) / (|Si||S,-|)] . (34) 

The modified similarity metrics also have simple geo- 
metric interpretations. The Mdist function is the ratio of 
the Euclidean distance between vectors and the maximal 
distance between the vectors (i.e. if the vectors are an- 
tiparallel). The M^ot function is proportional to the de- 
gree of spatial alignment between descriptor vectors. If 

and Sj are parallel, Mdot has a value of 1. If and Sj 
are antiparallel, M^ot has a value of 0. After normaliza- 
tion, the only difference between similarity metrics is the 
proportional weight given to the two types of differences. 
Matching functions based on projection are sensitive to 
differences in the signs of components, whereas distance- 
based metrics are only sensitive to do not magnitude of 
differences regardless of the sign. 

In addition to these metrics, we can define a wide va- 
riety of other metrics that are sensitive to particular dif- 
ferences in shape descriptors. For existing metrics, dif- 
ferences or correlations can be dampened or accentuated 
by applying an arbitrary power p to the component- wise 
comparison. In some cases, highly specialized similarity 
metrics can be applied to specific descriptors. An ex- 
ample of a specialized matching scheme is given by the 
quadratic metric of the shape histogram method of refer- 
ence [76 , which takes into account neighboring histogram 
bins when computing differences. 



IV. ALGORITHMS AND EXAMPLES 

The shape descriptors and similarity metrics described 
in the previous sections can be used to create various 
types of order parameters, correlation functions, and 
other structural metrics. In this section, we describe 
general algorithms that, when used with the appropriate 
shape descriptors and similarity metrics, can be applied 
to characterizing structure for a wide range of particle 
systems. In some cases, the algorithms are reformulated 
versions of standard schemes from the condensed matter 
literature. Additionally, we explore algorithms from the 
shape matching literature that have not yet been widely 
applied to particle systems, and present some completely 



new algorithms that exemplify the future direction of the 
framework. For all of the algorithms, we provide repre- 
sentative example problems to demonstrate their appli- 
cation. Our examples are mostly drawn from the self- 
assembly literature; however, in some cases we explore 
more idealized problems from the condensed matter lit- 
erature for simplicity. Since our goal is to present impor- 
tant elements of the shape matching framework rather 
than solve specific problems, the examples should be con- 
sidered proofs-of-concept rather than optimal solutions. 



A. Simple Structure Identification 

The goal in most computer science shape matching ap- 
plications is to identify unknown structures by search- 
ing a database of known reference structures. Structures 
are identified by the known structure that gives the best 
match. This type of scheme has already been applied 
to particle systems in the context of fast shape-based 
database searches for proteins and macromolecules [53]- 
59 . The algorithm for structure identification is given in 
pseudocode below. 

set match_best = 

set id_best = 'none' 

call compute_shape_descriptor (S_i) 

for each structure j in ref erence_database 
call compute_shape_descriptor (S_j ) 
set match = M(S_i, S_j) 
if match > match_best 
match_best = match 
set id_best = id_j 
end if 
end for 

return id_best 

Although structure identification schemes based on 
database searches have been applied in limited cases for 
condensed matter systems [60| |61j , many standard struc- 
ture identification schemes bear strong resemblance to 
this algorithm. For example, the common neighbor anal- 
ysis (CNA) scheme of reference [7 involves identifying 
local clusters by matching their numerical fingerprints, 
based on their distribution of local neighbor configura- 
tions, with those for predetermined ideal structures. In 
a rough sense, the CNA fingerprints can be considered 
shape descriptors for a given cluster, and the ideal fin- 
gerprints can be considered a database of reference struc- 
tures. A similar structural identification scheme is based 
on the bond order parameters of reference [TO^. In this 
scheme, local structures are identified by choosing cutoff 
values for various bond order parameters, beyond which 
query structures are said to match an ideal structure with 
a known high value of the order parameter [96 . In this 
case, the bond order parameters represent shape descrip- 
tors, the cutoffs act as similarity metrics, and the ideal 
structures used to set the cutoffs act as the reference 
database. 

As a minimal example of a structure identification 
scheme, consider the problem of identifying the small, 
imperfect cluster in Fig. [4k, where particles have been 
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FIG. 4: Identification of local structures (a) Basic identifica- 
tion of a slightly imperfect fee cluster. The table shows the 
matching values for the query structure compared to fee, hep 
and icosahedral reference clusters. (^) A fee crystal with hep 
stacking faults. The particles are colored based on their first 
neighbor shell configuration. Light (yellow) particles are in 
the fee configuration, while dark (blue) particles are in the 
hep configuration. 



slightly perturbed from their ideal face-centered-cubic 
(fee) positions. For the purpose of the present example, 
we consider a small library of reference structures consist- 
ing of fee, hexagonal-close-packed (hep) and icosahedral 
clusters, each with 13 atoms. To differentiate between 
these clusters, we use rotation invariant Fourier descrip- 
tors on the surface of the sphere S^^, where I = 4,6. 
These coefficients are chosen because they are the leading 
coefficients for this class of structures [10 . As shown in 
the table in Fig. [4^, the unknown structure best matches 
with the fee cluster, followed by the hep and icosahedral 
clusters, thus identifying the structure as fee. 

This simple matching scheme can be performed repeat- 
edly to identify local structures in a global sample. Con- 
sider the defective fee crystal shown in Fig. [4]3, which 
contains hep stacking faults [97 . The stacking faults can 
be identified by finding particles in local hep configu- 
rations rather than fee. First, local structural patterns 
must be created for each particle. This is done by cluster- 
ing all neighboring particles within a cutoff radius Tcut- 
Here, the cutoff is chosen to encompass the first peak in 
the radial distribution function g{r); first neighbors can 
alternatively be found by the Voronoi construction [^8 . 
Since we know a priori that ideal fee and hep clusters are 
the only possible structures, our reference library consists 
of these two structures exclusively. Particles are identi- 
fied by finding a best match, as in the previous example, 
and colored based on their local configuration (light cor- 
responds to fee, dark to hep) in Fig. [4]3, highlighting the 
stacking faults. 

Structure identification can also be performed for 
global samples. Global structure identification can be 
useful, e.g., when mapping structural phase diagrams 
(see section 
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FIG. 5: Identification of global crystalline structures for a sys- 
tem of mono-tethered nanospheres that aggregate into spher- 
ical micelles, (a) Bulk micelle structure. (6) The micelle 
centers-of-mass are extracted using a Gaussian filter, (c) The 
global pattern is created by superposition of the local pat- 
terns^ (d) Matching is based on a Fourier descriptor (sec- 
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II E[ ) that ind exes the global superposition of local pat- 
and identifies the micelles as bcc struc- 



( section 



III) 



system similar to references [22l |99] , whose tethers phase 
separate into spherical micelles (see Fig. The mi- 

celles themselves pack into an ordered crystalline super- 
structure. The structure of the crystals can be deter- 
mined by identifying the micelle centers of mass, which 
make up the set of positions {X} that describe the sys- 
tem (see Fig. [sJd) . The centers of mass are determined by 
applying a Gaussian filtering algorithm adapted from the 
colloidal science literature [81] |82] . A global crystalline 
pattern is determined by computing the superposition of 
local patterns (see Fig.js]^, similar to Fig.jS^i). The global 
pattern is then compared to that for several standard 
candidate crystals, by matching Fourier descriptors for 



identification, consider the mono-tethered nanosphere 



patterns on the surface of the sphere: Mdot{^^^ ^^ref)- 
Here, the Fourier descriptor is composed of the leading 
terms in the harmonic expansion for the standard crys- 
tals: S^^ =< |q4|, Iqgl, Iqgl, Iq^ol, |qi2l >• Notice that we 
use invariant Fourier coefficients for rotation-independent 
matching. The unknown crystal is identified by the ref- 
erence structure that gives the best match, in this case 
bcc (see Fig. |5]i). 

The structure identification applications presented in 
this section are successful because the potential refer- 
ence structures are known a priori. However, the iden- 
tification schemes fail if the unknown structure is not 
in the reference database. It is therefore important to 
carefully choose the appropriate reference structures for 
a given application. Often, optimal matches are ob- 
tained by using imperfect structures from the system 



rather than mathematicany perfect structures for refer- 
ence structures. As an added consideration, it is some- 
times possible to obtain partial structures that are highly 
ordered, but are missing one or more particles. This is 
an important factor in phase separated systems, systems 
with physical boundaries, and systems with high vari- 
ation in neighbor distances. For proper identification, 
partial structures must be added to the reference library 
explicitly [60], unless a shape descriptor capable of par- 
tial matching is used, such as a point matching descriptor 
or shape contexts [89^ . 
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B. Identification Without Reference Structures 

As the number of potential structures grows, compil- 
ing a comprehensive reference library becomes increas- 
ingly difficult. However, if the space of potential refer- 
ence structures is finite, a reference library can be created 
on-the-fly (OTF), precluding the need to define reference 
structures a priori. To do so, a new reference structure 
is added to the library whenever no suitable match is 
found. The algorithm for identification without known 
reference structures is given in pseudocode below. 

set match_best = 

set id_best = 'none' 

call compute_shape_descriptor (S_i) 

for each structure j in ref erence_database 
call compute_shape_descriptor (S_j ) 
set match = M(S_i, S_j) 
if match > match_best 

match_best = match 

set id_best = id_j 
end if 

if match_best < match_min 

call add_structure_to_database (S_i , counter) 

set id_best = counter 

set counter = counter + 1 
end if 
end for 

return id_best 

Notice that the algorithm requires an additional step 
where the reference structures themselves, which are ini- 
tially "unnamed," are identified. This can be accom- 
plished by using a standard identification algorithm sim- 
ilar to that outlined in section IV A[ or in some cases 
more simply by visual inspection. 

The OTF algorithm is applicable to simulations or 
algorithms that involve enumerating unique structures. 
One example is given by Bottom-Up Building Block As- 
sembly (BUBBA) [43 . The BUBBA algorithm efficiently 
generates low-energy clusters by trying different combi- 
nations of smaller low-energy clusters. To ensure that 
the clusters generated are not redundant, an OTF shape 
matching scheme is employed (see Fig. [6|. New clusters 
are added to a reference library if no match is found, 
while redundant clusters are ignored. In the end, the 
reference library contains a list of unique clusters. This 
type of scheme may also potentially be applied to infor- 
mation compression for mapping structural phase spaces. 
Since large portions of the parameter space are often re- 
dundant for high-resolution mappings, an OTF scheme 



FIG. 6: On-the-fly reference library with Bottom Up Build- 
ing Block Assembly (BUBBA) [43 . The BUBBA algorithm 
involves enumerating unique clusters of a given size N. To 
ensure that clusters are unique, new clusters are added to the 
reference library and given a unique identifier, while repeated 
clusters (yellow) are discarded. 



can be employed to quickly obtain a minimal number of 
unique structures. 



C. Identification in Systems with Disordered 
Structures 



Creating a comprehensive reference library is nearly 
impossible when local structures can assume disordered 
configurations. In this case, the space of potential refer- 
ence structures is essentially infinite, since "disordered" 
refers to the vast space of configurations with no par- 
ticular structure. As a solution, a structure that does 
not match any structure in the reference library within a 
certain threshold is considered "disordered [60 ." This 
requires that we choose a cutoff value for a best match. 
The cutoff must be chosen carefully; in thermal systems, 
an overly-stringent cutoff might cause a matching scheme 
to miss highly-ordered structures perturbed slightly from 
their ideal configurations, whereas an overly-permissive 
cutoff can misidentify highly disordered structures. In 
most cases, a sufficiently rigorous cutoff can be defined 
such that its value does not affect the qualitative results. 
The algorithm for structure identification with disordered 
local structures is given in pseudocode below: 

set match_best = 

set id_best = 'none' 

call compute_shape_descriptor (S_i) 

for each structure j in ref erence_database 
call compute_shape_descriptor (S_j ) 
set match = M(S_i, S_j) 
if match > match_best 

match_best = match 

set id_best = id_j 
end if 

if match_best < disordered_cut 

id_best = 'disordered' 
end if 
end for 

return id_best 
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Pressure 

FIG. 7: Pentagonal dipyramids (PDs) in the hard tetrahedron 
system 19 . (a) A snapshot of the hard tetrahedron liquid at 
packing density 0.5 and reduced pressure P = 60. (b) A 
PD-like cluster taken from the system. The arrows depict the 
pattern of directions [0, 0] on the surface of the sphere indexed 
for matching, (c) The number of PDs as a function of the 
identification cutoff value. Notice that for all cutoffs, there is 
an inflection point centered at P = 58, which corresponds to a 
possible liquid-liquid transition marked by a sudden increase 
in PD-like local ordering. 



As an example, consider the hard tetrahedron fluid 
studied in reference (Fig- [T^)- In this system, an 

important local motif, originally identified by visual in- 
spection, is the "pentagonal dipyramid" (PD), formed by 
five tetrahedra sharing a common edge. The PDs form 
a spanning network as the system goes through a liquid- 
liquid transition. To identify PDs, we first cluster all sets 
of five tetrahedra in the system that form a closed poly- 
gon. The shape of each cluster is defined by projecting 
the directions of the tetrahedra on the surface of a sphere 
(Fig. [tJ)). An ideal PD gives a 2D pentagonal pattern, 
which is taken as a reference structure. Although the 
pentagon reference structure is confined to a plane, the 
query structures are not. Therefore, this pattern is well- 
described by Fourier descriptors on the surface of the 
sphere, with matching given by Mdist{S^uery^ ^ILtagon)- 
We take rotation-invariant descriptors with frequency pa- 
rameter i = 5,6, ...10. For some systems, there is a 
clear distinction between ordered and disordered struc- 
tures. However, for the tetrahedron system we observe 
a continuous spectrum of PD-like ordering. Thus, we es- 
timate a cutoff based on visual inspection in the range 
Mcut ~ 0.9. Although this choice is arbitrary, it has little 
effect on structural trends for the system. Fig. ^ shows 
the number of PDs for a wide range of cutoffs. We see 



that for all cutoffs, the fraction of PDs exhibits a weak 
crossover, marked by an inflection point, near reduced 
pressure P = 58. This pressure, in turn, corresponds 
to an interesting thermodynamic transition for the sys- 
tem [19 . Although cutoffs result in different numbers of 
PDs, the same underlying physical behavior is captured 
regardless. 

D. Order Parameters and Temporal Correlation 
Functions 

Another standard application of structural metrics is 
to track structural transitions, either as a function of 
time or a changing reaction coordinate. This is typically 
accomplished by monitoring either an order parameter 
or correlation function as the system changes. In the 
context of the shape matching framework, the difference 
between the two cases is largely semantic; while an or- 
der parameter typically measures similarity with an ideal 
structure, a correlation function typically measures sim- 
ilarity between different structures in the system, sepa- 
rated in time and/or space. The simple algorithm for 
tracking a transition as a function of a changing param- 
eter is given below: 

call compute_shape_descr iptor (S_ref ) 
for p in changing_parameter 

call compute_shape_descriptor (S_p) 

set order_param [p] = M(S_p, S_ref) 
end for 

return order_param 

Here the similarity metric M G [0, 1] serves as a conve- 
nient order parameter. Tracking structural transitions 
is important for a wide variety of applications, includ- 
ing elucidating thermodynamic transitions [6l I1QQH1Q3| 
and assembly pathways gOl EH [lOl [105]. Many of the 
advanced molecular simulation techniques used to study 
transitions |1Q6H11Q] rely on structural metrics in the 
context of pseudo-reaction coordinates [106 , biasing pa- 
rameters |lQ9j . and collective variables [108 to guide the 
statistical sampling algorithm. Standard order parame- 
ters have been devised for various types of ordering, in- 
cluding bond orientational ordering [9, 10, 68, lU], liq- 
uid crystalline ordering [4l lll2j such as nematic [113 and 
smectic ^lOlj phases, chiral ordering [8 , and helical or- 
dering jll4j . Time correlation functions based on these 
order parameters have been applied to creating struc- 
tural autocorrelation, or "memory" functions for glassy 
liquids |115( I116j and growing quasicrystals [62 . 

As a simple example of creating an order parame- 
ter within the shape matching framework, consider the 
sheet-like structure self-assembled from laterally tethered 
nano-rods studied in reference [4T, and shown in Fig.jS^i. 
Due to an instability, the initial sheet relaxes into a he- 
lical structure that minimizes the free energy. We can 
track this structural transition by matching the shape of 
the sheet at a given time t with the final, fully equili- 
brated helical structure: M {St helix)' Since the struc- 
ture is 3-dimensional and has radial dependence, it can 



16 




1 2 3 

t (MD timesteps X 10^) 



FIG. 8: Assembly of a helical sheet composed of laterally 
tethered nano-rods ^44]. As time progresses, the initially 
flat sheet twists into a helix. The matching order parame- 
ter Mdist{St,Sheiix) compares the structure at time t with 
the shape of the final ideal helical structure. 



be indexed using a Zernike descriptor on the unit ball, 
S^^. Since the sheet only changes in terms of its twist 
in space, we save computational effort by only consider- 
ing points along the backbone of the sheet. To match 
the shape independently of the orientation of the sheet, 
we take rotation invariant moments with £ in the range 
4 < i < 12. Fig shows the helical order parame- 
ter as a function of time for a long molecular dynamics 
run. We observe that over tens of millions of MD steps, 
the sheet slowly and continuously equilibrates to the fi- 
nal helical structure, in agreement with visual inspection. 
Our matching order parameter gives a better indication 
of the structural transition than the more standard heli- 
cal order parameter H4 [114 , which is rather insensitive 
when the pitch of the helix is large compared to the ra- 
dius [44]. The noise in the data at long t is indicative 
of the relatively large fluctuations in shape that occur in 
equilibrium. 

As a slightly more complex example, consider the 
structures formed by the ditethered nanospheres shown 
in Fig. ^ |4j. The system goes through two 
transitions as a function of inverse temperature or 
quench depth, first from a disordered structure to a 
phase-separated structure characterized as a tetrago- 
nal cylinder/tetragonal-mesh (TC/TM), and then to a 
similar structure characterized by tetragonal cylinders 
(TC/TC) [41]. The abbreviations indicate the patterns 
formed by the tethers and nanoparticles, respectively. 
To obtain a quantitative measure of this behavior, we 
take three reference points: ideal snapshots from the 
disordered regime, the TC/TM regime and the TC/TC 
regime. As outlined in section several different de- 
scriptors are applicable to this type of global structure. 
Here, we use the shape distribution S^^, since it is dis- 
tinguishing between the three compared structures. Sep- 
arate S^^ descriptors are created for each of the three 
aggregating species. These descriptors are then concate- 
nated into an overall descriptor. Rather than considering 
surface particles exclusively, we use all particle positions, 
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FIG. 9: Structural transitions in a phase separated sys- 
tem, (a) Visual depiction of the three structures formed 
by a ditethered nanosphere system 41 (left to right: disor- 
dered, TC/TM, TC/TC). (b) Matching order parameter for 
the three reference structures as a function of inverse temper- 
ature. 



since this is simpler and is still distinguishing for the 
cylindrical phases under consideration. Fig. [oJd shows 
how the character of the system changes as a function 
of inverse temperature. We see that the structural tran- 
sition between the three phases is smooth and continu- 
ous, as verified by visual inspection. In a separate ref- 
erence 164], we show that a scheme based on the distri- 
bution of Fourier descriptors for local density maps gives 
identical results. 

As a relatively complex example, consider the gold 
nanowire undergoing tensile elongation shown in Fig. 10 i 
[46l[47 . As the wire elongates, a "neck" begins to form, 
in this case, at ~10 A. This type of structural tran- 
sition can strongly impact the transport properties of 
nanowires [117 and thus is important to identify. The 
neck region can be characterized by the loss of the orig- 
inal fee structure locally, e.g. a change in orientation, 
number of neighbors, or overall symmetry. Global crys- 
talline order parameters [10 may not be well suited, since 
only a subset of the system undergoes a transition. Stan- 
dard schemes that differentiate between crystal and liq- 
uid configurations locally, such as • [6 (see sec- 
tion 



IV E), are not well suited either, since the finite na- 
ture of the nanowire results in neighboring atoms having 
different local coordinations, even in the ideal fee config- 
uration (i.e. a mixture of full and partially coordinated 
fee clusters). Instead, to detect the onset of necking, we 
compare an atom's neighbor shell to its initial structure 
as a function of elongation L: M(Sl,Sl=o)- Local neigh- 
bor shells are indexed using rotation-dependent Fourier 
descriptors S^^ = qg, where £ = 6 is chosen because it 
has been shown to describe fee clusters well without re- 
quiring other frequencies ^ |88j . The number of atoms in 
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FIG. 10: Neck formation in a gold nanowire [46f Tfl. (a) 
depiction of the nanowire as a function of elongation L. 
(b) The standard deviation of matching values, A = 1 — 
(^Mdot{^^^ ,^^=o)) ^^^^^ as a function of elongation. 



the neck is small compared to the bulk, thus the average 
autocorrelation value is not strongly sensitive to neck for- 
mation. However, the spread in the data is sensitive to 
neck formation, and rapidly increases when atoms in the 
neck lose their original structure and yield low matching 
values. We can therefore create an ad hoc order parame- 
ter based on A = l-{Mdot{SP, ^[1^))^^^^^, where A = 1 
for the ideal configuration at L = 0, and decreases pro- 
portionally as the spread in matching values increases. 
Fig. [ToJd shows that the onset of neck formation occurs 
at 9.7 A, which is consistent with visual inspection. 



E. Spatial Correlation Functions 

In addition to characterizing how structures change as 
a function of time or a reaction coordinate, another com- 
mon application of structural metrics is to characterize 
how structures change in space. In the context of the 
shape matching framework, this involves choosing struc- 
tures from different points in the system, rather than 
ideal structures, as reference structures. Spatial correla- 
tion functions are often used to measure structural "cor- 
relation lengths." The algorithm for computing struc- 
tural correlation lengths within the matching framework 
is given in pseudocode below: 

for i in list_of _local_structures 



call compute_shape_descriptor (S_i) 
for j in list_of _local_structures 

call compute_shape_descriptor (S_ j ) 

set r = distanced, j) 

set correlation_f unction(r) += M(S_i, S_j) 
set normal izat ion (r) = normal izat ion (r) + 1 

end for 

return correlation_f unction / normalization 



In the condensed matter literature, structural correlation 
functions have been defined for crystal-like ordering in 
2d [9l EH], and 3d |6| |10^, nematic ordering pJLSj, and 
many other more specialized types of ordering. Other 
types of spatial correlation functions have been widely 
applied as well. One example is the • scheme of 
references [6l [88], which detects ordered crystal nuclei 
based on spatial correlations between local bond order 
parameters. 

As a simple example of creating a spatial correlation 
function within the shape matching framework, consider 
the problem of characterizing the formation of lamel- 
lae (sheets) in the system depicted in Fig. pj!|i , com- 
posed of tethered V-shaped nanoparticles (Fig.|ll|3) j42] . 
From visual inspection, it is clear that the nanoparticles 
have long-range orientational correlations in the lamellar 
phase, but not in the disordered phase. This can be quan- 
tified by computing an orientational correlation function 
for nanoparticles as a function of separation distance r. 
It may initially appear that the nematic descriptor can be 
applied to this problem; however, since the nanoparticles 
have two directors, we lose important information about 
particle packing by considering only one angle. Rather, 
an optimal metric reflects the correspondence between 
both directors of the nanoparticles. This can be mea- 
sured by a scheme based on the RMS descriptor, where 
the datapoints {X} for each nanoparticle are given by 
the two directors, pointing from the vertex to each 
of the endpoints xi, X2: {X} = {xi — x^,X2 — x^}. 
Assignment of corresponding vectors is performed us- 
ing the "naive" method (see section [lIB ), which is ex- 
act for this particular problem. Fig. |11|3 shows the 
average value of the orientational correlation function 

(^MdotiS^^^, Sf ^'^(r)^ as a function of the radial sepa- 
ration r for several different snapshots. In the disordered 
phase, only very short range correlations are present and 

{^M^ot(Sf^^,Sf^^(r)^ is small for all r. The correla- 
tions quickly grow as the system begins to form sheets, 
and the range of ^Mrfot(Sf Sf ^^(r)^ increases. In 

the final state, the lengthscale is infinite, spanning the 
length of the simulation cell. 

As a slightly more complex example, consider the prob- 
lem of measuring time-dependent structural correlations 
in the 2D binary mixture shown in Fig. [l2^. The system 
consists of a 50:50 mixture of spherical particles with a 
diameter ratio 1.4:1, and can represent either a model 
supercooled liquid |119[ 1120] or a granular system near 
the onset of jamming |12H 



I122j . The system contains 
small hexagonal crystal (hex) grains arranged randomly 
within the disordered bulk liquid. To measure the ef- 
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FIG. 11: Spatial correlations in a system of tethered nano "V's" [42| . (a) Depiction of the formation of a lamellar phase as 
the system evolves in time on cooling. (6) Depiction of the coarse-grained nanoparticle model, (c) Nanoparticle orientational 
correlations as a function of separation distance r. 
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FIG. 12: Spatial and temporal correlations in hexagonal clus- 
ters, a) Depiction of the 2d hexagonal grains identified in 
the system. Each grain is given a different color while liquid- 
like particles are colored black, b) Time-dependent structural 
decorrelation function for the hexagonal grains and the overall 
liquid. Notice that the hexagonal grains retain their structure 
much longer than the overall liquid. 



feet of hex stmeture on dynamies, we can compare the 
rate of structural decorrelation in the hex and non-hex 
regions. To do so, we require two correlation functions: 
first, a spatial correlation function to identify the hex 
regions and, second, a temporal correlation function to 
quantify how closely a given structure "remembers" its 
initial configuration as a function of time. 

Identifying the hex grains within the bulk liquid re- 
quires a structural criterion that differentiates between 
particles in the liquid and hex regions on a per-particle 
basis. As mentioned previously, the qg • scheme 
of reference [6 can be used to monitor nucleation and 
growth in 3d systems that form fee, bcc, and hep 
crystals [88l [971 fT23HT26] . Although the scheme was 
originally based on the i = 6 Fourier coefficient q^, 
other shape descriptors can be just as easily be substi- 
tuted [29\ [62] . The main physical insight underlying the 
qg • qg scheme is that crystals often contain local particle 
configurations that match with their neighbors in terms 
of both shape and orientation. Therefore, crystal-like par- 
ticles can be identified by detecting those that match well 
with their neighbors. In analogy with reference [6 , a 
good indicator of local crystal-like local ordering is the 
fraction of solid-like matches with neighbors: 

n 

fsoHd = l/n^e[M^o,(S„S,) - Me,,]. (35) 
j 

Here is the Heaviside function and "j" is a neighbor of 
"z," and S is a lotdition- dependent shape descriptor. Par- 
ticles with a minimal fraction of solid-like matches fcut 
are considered to be locally crystalline. The cutoffs can 
be determined by viewing plots of the P{M(iot{^i^^j)) 
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and P{f solid) distributions for the bulk liquid and bulk 
solid [H [88], or simply by visual inspection, as we per- 
form here. This scheme holds for crystals in general, 
provided the neighbor shells all have the same shape. In 
reference [64 , we describe how this scheme can be mod- 
ified to handle crystals with an assortment of neighbor 
shells. The algorithm for detecting crystal grains is given 
in pseudocode below: 

for i in list_of .particles 

set S[i] = compute_neighbors (i , rcut) 

call compute_rotation_dependent_shape_descriptor (S [i] ) 
end for 

for i in list_of .particles 
set n_solid = 
set count = 
for j within rcut of i 

if M(S[i] , S[j]) > M_cut 
n_solid = n_solid + 1 
end if 

count = count + 1 
end for 

if n_solid / count > f_cut 

append(list_of _solid_particles , i) 
end if 
end for 

For our example, we identify hex grains using the S^^^ 
descriptor with Mcut = 0.99 and fsoUd = 0-5. Notice 
that this scheme does not simply detect local hexagons; 
rather it identifies local hex regions while making the 
important distinction between isolated hexagons and the 
intermediate-range hex clusters of interest. Our tempo- 
ral correlation function is defined by matching the point- 
matching descriptor for each cluster at time t with itself 
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at a reference time to: Mdot[^^ {t), S^'''^{tQ)]. Yig 

shows the correlation function for hexagonal and non- 
hexagonal particles. We see that the hexagonal parti- 
cles retain their structure longer on average than non- 
hexagonal particles. Similar correlation functions have 
been applied to study glassy dynamics |115j and to study 
how the structure of liquid clusters change as they attach 
to a growing quasicrystal [62] . 

The general ideas underlying the correlation functions 
outlined above can be applied to more abstract problems 
as well. For example, rather than creating correlation 
functions in space and time, we can create correlation 
functions in parameter space. Consider the problem of 
automatically generating the structural phase diagram 
for the 2D Lennard- Jones-Gauss (LJG) system |127j 
shown in Fig. [13^ from a collection of simulation snap- 
shots for each statepoint. The phase diagram was gener- 
ated by visual inspection of over 5000 statepoints [127] . 
We can automate the creation of this phase diagram by 
using an idea similar to the qg-qg scheme outlined above. 
In this case, rather than finding structural correlations 
between neighboring particles in real-space, we can cal- 
culate correlations between neighboring snapshots in pa- 
rameter space. For each point on the structural phase di- 
agram, we compute = M(Si, Sj), where "z" and 
"j" are neighbors in parameter space, and S is a global 
shape descriptor. Here, we take the global descriptor as 
a combination of a global superposition descriptor, in- 
dexed by a shape histogram with no = 20, = 1, and a 
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FIG. 13: Structural phase diagram for the 2d Lennard- Jones 
Gauss system [127] . (a) Structural phase diagram created by 
visual inspection |127j . (6) Structural phase diagram created 
by shape matching. Each pixel in parameter space is given 
an intensity based on the average match with structures for 
neighboring points. 



global probability distributions descriptor based on local 
Fourier descriptors with frequency range £ = 6, 7, ...11: 



obal 



= s 



;.H2 

^ gloh 



ahPi^focai))' Points in stable regions 
of parameter space match well with neighboring points 
and have a high value of /(i), whereas transitional points 
match poorly and have a low value of The algo- 

rithm for creating a visual map of the structural phase 
boundaries for a parameter space is given in pseudocode 
below: 

for each point i in parameter space 

call compute_shape_descriptor (S [i] ) 
end for 



for each point i in parameter space 
set pixel_intensity [i] = 
for each neighboring point j 

set pixel_intensity [i] += M(S[i]j 
end for 
end for 



S[j]) 



The transitional points map out structural phase 
boundaries that look very similar to the diagram created 
by visual inspection. Notice that the global superposi- 
tion descriptor detects no difference between the hexag- 
onal and honeycomb crystals, since both structures are 
six- fold symmetric, and thus yield equivalent combined 
shape histograms. Additionally, the superposition de- 
scriptor picks up a slight artificial "boundary" within the 
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hexagonal region near tq ~ 1.15, where the hexagonal 
crystal begins to form multiple grains rather than a sin- 
gle crystal, resulting in different shape histograms. The 
probability distributions descriptor, on the other hand, 
gives no distinction between the pentagonal and decago- 
nal phases, since they are nearly identical locally, differ- 
ing only in long range ordering. Overall, both sets of 
global information taken in combination are necessary to 
correctly find all of the phase boundaries for this partic- 
ular application. However, in many cases, the space of 
structures is sufficiently non-degenenate to be described 
by a single method. 



F. Heat Maps and Grouping 

Another common application of shape matching tech- 
niques is to the problem of visually grouping or classify- 
ing similar structures [128^ . Grouping objects based on 
shape similarity has also been applied recently to macro- 
molecules and proteins [56^, 'SS^. Grouping can be ac- 
complished by plotting the matrix of pairwise matching 
values known as a "similarity matrix" or "heat map." 
The algorithm for computing a similarity matrix is given 
in pseudocode below: 

for i in list_of _local_structures 

call compute_shape_descriptor (S_i) 
for j in list_of _local_structures 

call compute_shape_descriptor (S_ j ) 

set matrix[i,j] = M(S_i, S_j) 

end for 
return matrix 

Objects can be grouped or classified based on features 
of the plot. As an example, consider the TIP4P water 
clusters 
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shown in Fig. 
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The matrix shows the 



match values obtained for minimum energy clusters of 
sizes N = 2 — 21, which are available from the Cambridge 
Cluster Database [65]. The clusters are indexed using 
rotation-invariant Zernike descriptors with frequency pa- 
rameters ^ = 4, 5, ..12. 

The patterns displayed in the heat map require some 
interpretation. The high correlation along the diagonal 
i = j is common to all heat maps, and simply indicates 
that structures match perfectly with themselves. The re- 
gion = 12 — 15 displays a bright box, which indicates 
a group of structures that all match well with one an- 
other. The region A" = 7 — 10 displays a checkerboard 
pattern, which indicates that every-other cluster matches 
well. Cluster A' = 16, due to its unique non-compact na- 
ture, matches poorly with all other clusters, as indicated 
by the dark (purple) cross at A" = 16. In addition to 
grouping and classifying objects, heat maps can be used 
to visually indicate convergence with respect to a chang- 
ing parameter. Multiple heat maps based on different 
descriptors can be constructed for the same set of struc- 
tures to show the similarities on different levels of order- 
ing. Although we consider local clusters for our example, 
heat maps can be applied to any structures that can be 
indexed by shape descriptors, including global structures. 




FIG. 14: Grouping and classifying structures based on shape 
similarity. The plot shows a similarity matrix for energy- 
minimized TIP4P water clusters [48] . The matrix simultane- 
ously shows the pairwise matching values for all clusters. 



V. FUTURE OUTLOOK 

In summary, we have introduced a shape matching 
framework for creating new structural metrics for com- 
plex patterns, such as those encountered in self-assembly. 
All of the methods and examples outlined here are ac- 
cessible online through our C/C++ shape matching li- 
brary [49 . Although our examples and discussions here 
are geared towards self-assembly and condensed matter 
physics, the general ideas underlying the shape matching 
framework are widely applicable to systems with complex 
structures, such as those encountered in computational 
biology [64]. In the future, new shape descriptors and 
algorithms can be added to the framework to expand its 
scope to different classes of structures and problems. 

The example applications and shape descriptors that 
we have presented here represent only a small subset of 
the vast range of possibilities yet to be explored. One 
obvious area for future study is to test the applicabil- 
ity of the wealth of shape descriptors from the shape 
matching literature to particle systems. Another promis- 
ing area to explore is the creation of new abstract order 
parameters and correlation functions, such as the phase 
space correlation function of Fig. [l3j This type of ap- 
plication may represent one of the most important uses 
for shape matching moving forward; replacing the human 
element with a computer algorithm to explore parameter 
space has the potential to greatly expedite self-assembly 
research. 
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